Efficient Two-Stage Genome-Wide Association Designs Based on False Positive Report Probabilities

Author

Peter Kraft

Abstract

Despite recent advances, very-high-throughput (VHT) technologies capable of genotyping hundreds of thousands of SNPs in individual samples remain prohibitively expensive for the large studies necessary to screen substantial sections of the genome for variants with modest effects on disease risk. This paper presents a two-stage strategy in which a portion of the available samples is genotyped with VHT technology, and a small number of the most promising variants are then genotyped with standard high-throughput techniques in the remaining samples as an independent replication study. The sample sizes in the first and second stages and the corresponding significance levels are chosen to limit the False Positive Report Probability (FPRP) while maximizing the number of Expected True Positives (ETPs). (The FPRP is the conditional probability that a marker is not truly associated with disease, given a significant test for disease-marker association.) For a fixed budget, the two-stage strategy has greater power (a larger number of ETPs) than the single-stage strategy, in which all subjects are genotyped using expensive VHT technology. Furthermore, concentrating on the FPRP leads to considerable savings relative to strategies designed to control the family-wise error rate (e.g., Bonferroni correction). The FPRP and number of ETPs can also accommodate researchers' prior beliefs about the number of causal loci and the magnitude of their effects. The expected number of false positives does not change if the true number and effects of causal loci differ from the specified prior (although the false discovery rate will vary), thus limiting the absolute amount of resources spent chasing "false leads."
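
As a rough illustration of the quantities named in the abstract, the sketch below computes the FPRP from its definition via Bayes' rule, together with the expected numbers of true and false positives for a two-stage design. It is not the paper's optimization procedure. It assumes that a marker is reported only if it is significant in both stages and that stage 2 is an independent replication, so the overall significance level and power are taken as products of the per-stage values. The function names and all numeric inputs are hypothetical and chosen only for illustration.

```python
def fprp(alpha, power, prior):
    """P(marker is not truly associated | test is significant), by Bayes' rule.

    alpha: probability a null marker tests significant
    power: probability a causal marker tests significant
    prior: prior probability that a given marker is causal
    """
    false_pos = alpha * (1.0 - prior)
    true_pos = power * prior
    return false_pos / (false_pos + true_pos)


def two_stage_summary(n_markers, n_causal, alpha1, power1, alpha2, power2):
    """Expected true/false positives and FPRP for a two-stage design.

    Assumption (illustrative): stage 2 uses only the remaining samples,
    so the two tests are independent and their probabilities multiply.
    """
    overall_alpha = alpha1 * alpha2
    overall_power = power1 * power2
    prior = n_causal / n_markers
    return {
        "ETP": n_causal * overall_power,                # expected true positives
        "EFP": (n_markers - n_causal) * overall_alpha,  # expected false positives
        "FPRP": fprp(overall_alpha, overall_power, prior),
    }


if __name__ == "__main__":
    # Hypothetical inputs: 500,000 SNPs, 10 assumed causal loci.
    summary = two_stage_summary(
        n_markers=500_000, n_causal=10,
        alpha1=0.01, power1=0.9,
        alpha2=1e-4, power2=0.8,
    )
    for name, value in summary.items():
        print(f"{name}: {value:.4f}")
```

Note that the expected number of false positives here depends only on the number of null markers and the significance levels, not on the assumed effect sizes, which is consistent with the abstract's remark that it is insensitive to the specified prior on effects.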

Related articles

Programs for calculating the statistical powers of detecting susceptibility genes in case–control studies based on multistage designs

MOTIVATION A two-stage association study is the most commonly used method among multistage designs to efficiently identify disease susceptibility genes. Recently, some SNP studies have utilized more than two stages to detect disease genes. However, there are few available programs for calculating statistical powers and positive predictive values (PPVs) of arbitrary n-stage designs. RESULTS We...

Comparative analysis of different approaches for dealing with candidate regions in the context of a genome-wide association study

Genome-wide association studies (GWAS) test hundreds of thousands of single-nucleotide polymorphisms (SNPs) for association to a trait, treating each marker equally and ignoring prior evidence of association to specific regions. Typically, promising regions are selected for further investigation based on p-values obtained from simple tests of association. However, loci that exert only a weak, l...

Optimal two-stage genome-wide association designs based on false discovery rate

Genome-wide association studies are likely to be conducted in large scale in the near future. In such studies, searching over hundreds of thousands of markers for the few ones that are associated with disease brings out the multiple-hypothesis testing problem in its severe form. We explore, in a two-stage design, how the use of false discovery rate (FDR) can alleviate the burden of a prohibitiv...

Optimal designs for two-stage genome-wide association studies.

Genome-wide association (GWA) studies require genotyping hundreds of thousands of markers on thousands of subjects, and are expensive at current genotyping costs. To conserve resources, many GWA studies are adopting a staged design in which a proportion of the available samples are genotyped on all markers in stage 1, and a proportion of these markers are genotyped on the remaining samples in s...

Identifying significant gene‐environment interactions using a combination of screening testing and hierarchical false discovery rate control

Although gene-environment (G× E) interactions play an important role in many biological systems, detecting these interactions within genome-wide data can be challenging due to the loss in statistical power incurred by multiple hypothesis correction. To address the challenge of poor power and the limitations of existing multistage methods, we recently developed a screening-testing approach for G...

Journal:

Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing

Publication date: 2006
